Increase initialCapacity for HashSet in ExtractDependencies.scala #18219
Conversation
Growing that HashSet seems to take 1.27% of the allocations when compiling dotty itself.
test performance please

performance test scheduled: 239 job(s) in queue, 1 running.
```diff
@@ -447,7 +447,7 @@ private class ExtractDependenciesCollector extends tpd.TreeTraverser { thisTreeT
   // Avoid cycles by remembering both the types (testcase:
   // tests/run/enum-values.scala) and the symbols of named types (testcase:
   // tests/pos-java-interop/i13575) we've seen before.
-  val seen = new mutable.HashSet[Symbol | Type]
+  val seen = new mutable.HashSet[Symbol | Type](initialCapacity = 128, loadFactor = mutable.HashSet.defaultLoadFactor)
```
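For context, `scala.collection.mutable.HashSet` exposes a two-argument constructor taking an initial capacity and a load factor (with `HashSet.defaultLoadFactor` = 0.75), so pre-sizing skips the repeated grow-and-rehash steps of a default-sized set. A minimal standalone sketch of the pattern used in the diff:

```scala
import scala.collection.mutable

object PresizedSetDemo {
  def main(args: Array[String]): Unit = {
    // Default construction starts small and rehashes each time
    // size exceeds capacity * loadFactor.
    val grown = new mutable.HashSet[Int]

    // Pre-sizing to 128 with the default load factor (0.75) means no
    // rehash happens until the set holds more than 96 elements.
    val presized =
      new mutable.HashSet[Int](initialCapacity = 128, loadFactor = mutable.HashSet.defaultLoadFactor)

    (1 to 90).foreach { i => grown += i; presized += i }
    assert(grown == presized) // contents are identical; only growth behaviour differs
    println(presized.size)    // 90
  }
}
```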
@nicolasstucki Nice histogram! I wanted to build one too but gave up eventually. How did you do it?
Do your percentages consider the `loadFactor` of 75%? Should I update the `initialCapacity` to 64?
```scala
def addTypeDependency(tpe: Type)(using Context): Unit = {
  val traverser = new TypeDependencyTraverser {
    def addDependency(symbol: Symbol) = addMemberRefDependency(symbol)
  }
  traverser.traverse(tpe)
  println("seen: " + traverser.seen.size) // instrumentation: log the final set size
}

def addPatMatDependency(tpe: Type)(using Context): Unit = {
  val traverser = new TypeDependencyTraverser {
    def addDependency(symbol: Symbol) =
      if (!ignoreDependency(symbol) && symbol.is(Sealed)) {
        val usedName = symbol.zincMangledName
        addUsedName(usedName, UseScope.PatMatTarget)
      }
  }
  traverser.traverse(tpe)
  println("seen: " + traverser.seen.size) // instrumentation: log the final set size
}
```
```
sbt "clean; scala3-bootstrapped/compile" > sbtOutput.txt
```

then filter out the `seen:` lines from that file and import the numbers into Google Sheets.
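The filtering step might look something like this (a sketch only; it assumes the `seen: N` lines printed by the instrumentation above end up in `sbtOutput.txt`):

```shell
# Extract the set sizes and build a histogram:
# one line per distinct size, prefixed by its occurrence count.
grep -o 'seen: [0-9]*' sbtOutput.txt \
  | awk '{print $2}' \
  | sort -n \
  | uniq -c \
  > histogram.txt
```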
I did not consider the load factor. My percentages only take into account the size of the set. I would go for 64 if we want to minimize those allocations.

64 is enough to cover almost 99% of the cases, so it's a good trade-off between memory consumed and allocation count.
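For reference, a hash table resizes once its size exceeds initialCapacity × loadFactor, so (assuming the default 0.75 factor) a capacity of 64 holds 48 elements before growing and 128 holds 96. A small sketch of the arithmetic behind these capacity choices:

```scala
object ResizeThreshold {
  // Number of elements a hash table can hold before it must grow,
  // given its capacity and load factor.
  def threshold(initialCapacity: Int, loadFactor: Double): Int =
    (initialCapacity * loadFactor).toInt

  def main(args: Array[String]): Unit = {
    println(threshold(64, 0.75))  // 48
    println(threshold(128, 0.75)) // 96
  }
}
```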
test performance please

performance test scheduled: 239 job(s) in queue, 1 running.

Performance test finished successfully: Visit https://dotty-bench.epfl.ch/18219/ to see the changes. Benchmarks are based on merging with main (2753a17)
There are 3 months of non-benchmarked commits #18221 (comment). The performance regression could have happened before this change.

These are being benchmarked; it should take less than two weeks. I suggest you wait for the missing info to check whether this PR is responsible for the regression or not.
Here is my comparison of CPU time before and after (on warmed-up scala3-compiler-bootstrapped): traversal now takes 47% (CPU time) of the phase instead of 52% (more of a memory bottleneck is allocating Node in the hashset). I also noticed on the side that
Can you trigger another performance test now?

I'm also working on a more thorough optimisation for this phase, but sure, let's do the bench.

test performance with #sbt please

performance test scheduled: 3 job(s) in queue, 1 running.

Performance test finished successfully: Visit https://dotty-bench.epfl.ch/18219/ to see the changes. Benchmarks are based on merging with main (ca6a80e)
alternative to #18219

Reduces allocations in the phase by 77% (on `scala3-compiler-bootstrapped`); allocations contributed by this phase to the whole compilation pipeline drop from about 12% to 4%. Reduced allocations come from:

- reusing a single set for the `seen` cache.
- using linear probe maps (`dotty.tools.dotc.util`), avoiding allocation of nodes.
- no `ClassDependency` allocation each time a dependency exists (many are likely duplicated).

Performance is also improved due to less expensive hashing (`EqHashMap`) and the introduction of `EqHashSet`, which also reduces the number of times we hash (`add` method on HashSet, and optimised `getOrElseUpdate`); probably also from reduced heap pressure of allocating. Now the majority of allocation for this phase is by calling into Zinc.
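The node-free layout described above can be illustrated with a toy open-addressing set (a sketch only; the real `EqHashSet` in `dotty.tools.dotc.util` differs in details such as identity-based hashing and its growth policy):

```scala
object LinearProbeSetDemo {
  // A toy open-addressing (linear probing) set: elements live directly in a
  // flat array, so adding an element allocates no per-entry Node objects,
  // unlike a chained hash set.
  final class ProbeSet[A <: AnyRef](initialCapacity: Int = 16) {
    private var table = new Array[AnyRef](initialCapacity)
    private var used = 0

    private def index(x: A): Int = (x.hashCode & 0x7fffffff) % table.length

    /** Returns true if the element was newly added. */
    def add(x: A): Boolean = {
      var i = index(x)
      while (table(i) != null) {
        if (table(i) == x) return false // already present
        i = (i + 1) % table.length      // probe the next slot
      }
      table(i) = x
      used += 1
      if (used * 2 > table.length) grow() // keep load factor under 50%
      true
    }

    def size: Int = used

    private def grow(): Unit = {
      val old = table
      table = new Array[AnyRef](old.length * 2)
      used = 0
      old.foreach { e => if (e != null) add(e.asInstanceOf[A]) }
    }
  }

  def main(args: Array[String]): Unit = {
    val s = new ProbeSet[String]
    assert(s.add("a"))  // newly added
    assert(!s.add("a")) // duplicate, no allocation, no change
    (1 to 100).foreach(i => s.add(i.toString))
    println(s.size) // "a" plus "1".."100"
  }
}
```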
Closing now that #18403 is merged.